Module 10 - Programming Assignment

This is the notebook for the Module 10 Programming Assignment.

Here are a few tips for using the iPython HTML notebook:

You can use tab . Try le<<tab> and see the available functions.
You can change the type of cell by picking "Code" or "Markdown" from the menu at the left.
If you keep typing in a Markdown text area, you will eventually get scroll bars. To prevent this, hit return when you come to the end of the window. Only a double return creates a new paragraph.
You can find out more about Markdown text here: http://daringfireball.net/projects/markdown/ (Copy this link and put it in another tab for reference--Don't click it or you'll leave your notebook!).
Every so often, restart the kernel, clear all output and run all code cells so you can be certain that you didn't define something out of order.

You should rename this notebook to be <your JHED id>_PR10.ipynb Do it right now.

Make certain the entire notebook executes before you submit it. (See Hint #5 above)

Change the following variables:



In [1]:

    
name = "Shehzad Qureshi"
jhed_id = "squresh6"
if name == "Student Name" or jhed_id == "sname1":
    raise Exception( "Change the name and/or JHED ID...preferrably to yours.")

Add whatever additional imports you require here. Stick with the standard libraries and those required by the class. The import gives you access to these functions: http://ipython.org/ipython-doc/stable/api/generated/IPython.core.display.html (Copy this link) Which, among other things, will permit you to display HTML as the result of evaluated code (see HTML() or display_html()).



In [2]:

    
from IPython.core.display import *
from StringIO import StringIO
import numpy as np

In this programming assignment you will implement a feed forward neural network (multi-layer perceptron) and the back propagation algorithm. You only need one hidden layer but the number of input nodes, hidden nodes and output nodes should be configurable.

You will apply your neural network to a regression task using the accompanying file: neural_network_regression.csv. The function you're learning looks like this:



In [3]:

    
Image( filename="neural_network_regression.png")









    Out[3]:

which is clearly not learnable as a simple linear regression.

You're going to have to try a few different numbers of nodes in the hidden layer. What you really should do is put the evaluation techniques we learned last week to good use but that might be a bit much to expect in one week. If you can do it, you should. Otherwise, guestimate.

For simplicity, you should assume that you have a train function that takes parameters and spits out some representation of the neural network. That representation is entirely up to you.

The second function is output. It should take your network representation, an instance and emit y_hat. If you are using test data, this means you need to supply only the "xs" part to the output function and then compare y_hat and y.

Don't forget that you need to add $x_0=1$ everywhere (the "biases") inside the network...every node really is just a mini-logistic regression. I've also started you out with some default parameter values. You may need to change them.

Normally, you'd have to apply mean normalization which would add a whole other layer of parameters to store in the network Dict but you don't have to here, because the values of x1, x2 and y are already on the interval (0, 1) so you don't have to worry about that for this assignment.

Aritifical Neural Networks

Let's start by defining what the network looks like. We need to keep track of the hidden and output theta values, therefore we need two lists.

List 1 will represent a list of hidden node theta values
List 2 will represent a list of output node theta values

Each list will have at least one theta value and a bias to start with. All values are initialized to some random number in the interval [0, 1)



In [4]:

    
def define_network(num_input, num_hidden, num_output):
    hidden = np.random.random((num_hidden, num_input + 1))
    output = np.random.random((num_output, num_hidden + 1))
    return {"hidden" : hidden, "output" : output}



In [5]:

    
test_network = define_network(2, 2, 1)
test_network









    Out[5]:





{'hidden': array([[ 0.46332822,  0.61982882,  0.27448695],
        [ 0.41652346,  0.00799675,  0.90755639]]),
 'output': array([[ 0.77768316,  0.96585308,  0.62854403]])}

Let's add a way to prettify the network ooutput. At the very least make it a bit more readable.



In [6]:

    
def print_network(network):
    print "\nOutput:"
    for x, row in enumerate(network["output"]):
        for y, col in enumerate(row):
            print "T[{0}][{1}] = {2}".format(x, y, col)
        print
    print "Hidden:"
    for x, row in enumerate(network["hidden"]):
        for y, col in enumerate(row):
            print "T[{0}][{1}] = {2}".format(x, y, col)
        print



In [7]:

    
print_network(test_network)









    



Output:
T[0][0] = 0.777683161658
T[0][1] = 0.965853081339
T[0][2] = 0.628544026531

Hidden:
T[0][0] = 0.463328217697
T[0][1] = 0.619828816316
T[0][2] = 0.27448695392

T[1][0] = 0.416523459439
T[1][1] = 0.00799674762378
T[1][2] = 0.907556385633

We need to split the incoming data into input and output. We shall insert an initial bias of $1.0$ to use as $x_0$.



In [8]:

    
def split_data(dataset, num_inputs):
    x, y = np.array_split(dataset, num_inputs, np.shape(dataset)[1] - num_inputs)
    ones = np.ones((x.shape[0], 1))
    x = np.column_stack((ones, x))
    return x, y



In [9]:

    
dataset = np.loadtxt('neural_network_regression.csv', delimiter=',')
test_x, test_y = split_data(dataset, 2)
print test_x.shape, test_y.shape









    



(250L, 3L) (250L, 1L)

This is our sigmoid function. It works on single values or arrays.

$\hat{y} = {1 \over {1 + e^-z}}$

Where $z$ is defined as:

$z = \sum{\theta_i \cdot x_i}$



In [10]:

    
def sigmoid(z):
    return 1 / (1 + np.exp(-z))



In [11]:

    
print sigmoid(0.1) # = 0.5249
print sigmoid(0.5) # 0.6224
print sigmoid(np.array([0.1, 0.5, -0.162, 0.2408]))









    



0.524979187479
0.622459331202
[ 0.52497919  0.62245933  0.45958834  0.55991079]

Now we need to calculate the output of the network, at the output nodes and the hidden nodes as well. We add a bias value of 1.0 to $y_i^H$ to complete the matrix calculation.



In [12]:

    
def calculate_output(network, x):
    x1 = np.asmatrix(x)
    y_hidden = sigmoid(np.dot(x1, network["hidden"].T))
    y_hidden_bias = np.insert(y_hidden, 0, 1, axis=1)
    y_output = sigmoid(np.dot(y_hidden_bias, network["output"].T))
    return np.asarray(y_hidden), np.asarray(y_output)



In [13]:

    
test_y_h, test_y_o = calculate_output(test_network, test_x)
print test_y_h[0], test_y_o[0]









    



[ 0.49685473  0.44292301] [ 0.82287789]

Now we need to calculate the error in the output. This can be done directly since $y$ is given.

$\delta^o = \hat y \cdot (1 - \hat y) \cdot (y - \hat y)$

where $\hat y$ is the calculated output and $y$ is the truth.



In [14]:

    
def calculate_delta_output(y, y_h):
    d_o = y_h * (1 - y_h) * (y - y_h)
    return np.asarray(d_o)



In [15]:

    
test_d_o = calculate_delta_output(test_y, test_y_o)
print test_d_o[0]









    



[-0.08033563]

We also need to calculate the error in the hidden nodes as well. This is done by weighting the error for each node as seen in the output.

$\delta_i^H = \hat y_i \cdot (1 - \hat y_i) \cdot \sum {\theta_i^o \cdot \delta_i^o}$

where $\hat y_i$ is the output of each hidden node, $\theta_i^o$ is the theta value of each connection to the outputs, $\delta_i^o$ is the error from that output.



In [16]:

    
def calculate_delta_hidden(theta_o, y_h, d_o):
    y = y_h.T
    t = np.delete(theta_o, 0, axis=1).T
    d_h = [y[i] * (1 - y[i]) * np.sum(t[i] * d_o) for i, _ in enumerate(y)]
    return np.asarray(d_h).T



In [17]:

    
test_d_h = calculate_delta_hidden(test_network["hidden"], test_y_h, test_d_o)
test_d_h[0]









    Out[17]:





array([-0.94751127, -1.76075702])

This function will update our theta values for the network, given all necessary parameters. The update equations are:

$\theta_i^o = \theta_i^o + \delta_i^o \cdot y_i^H$

$\theta_i^H = \theta_i^H + \delta_i^H \cdot x_i$



In [18]:

    
def update_theta(network, d_o, d_h, y_h, x, alpha):
    x1 = np.atleast_2d(x)
    y_h_b = np.insert(np.atleast_2d(y_h), 0, 1, axis=1)
    d_h_b = np.insert(np.atleast_2d(d_h), 0, 1, axis=1)
    output = np.array([n + y_h_b * d_o * alpha for n in network["output"]])
    hidden = np.array([n + x1 * d_h_b * alpha for n in network["hidden"]])
    output = np.squeeze(output, axis=(1,))
    hidden = np.squeeze(hidden, axis=(1,))
    #return np.squeeze(np.array([hidden, output]))
    return {"hidden" : hidden, "output" : output}



In [19]:

    
new_network = update_theta(test_network, test_d_o[0], test_d_h[0], test_y_h[0], test_x[0], 0.05)
print_network(new_network)









    



Output:
T[0][0] = 0.773666379921
T[0][1] = 0.963857324341
T[0][2] = 0.626764901472

Hidden:
T[0][0] = 0.513328217697
T[0][1] = 0.641358510234
T[0][2] = 0.336783502503

T[1][0] = 0.466523459439
T[1][1] = 0.0295264415418
T[1][2] = 0.969852934216

The train function takes in a dataset, a parameter set of number of input, hidden and output nodes, as well as epsilon (for convergence checking) and alpha (learning rate) values.

After defining the network and splitting the data as necessary, it performs the following operations:

Calculates the output of the network
Calculates the error of hidden and output nodes
Updates the theta values of the network nodes
Calculates the error and checks for convergence

The first two are vectorized using Numpy. The third cannot be vectorized since it is a recursive function dependent on the previous theta value.



In [20]:

    
def train(dataset, num_inputs, num_hidden, num_output, epsilon=0.0000001, alpha=0.05, debug=True):
    network = define_network(num_inputs, num_hidden, num_output)
    x, y = split_data(dataset, num_inputs)
    previous_error, current_error = 0, 1
    while np.abs(current_error - previous_error) > epsilon:
        y_hidden, y_output = calculate_output(network, x)
        delta_output = calculate_delta_output(y, y_output)
        delta_hidden = calculate_delta_hidden(network["output"], y_hidden, delta_output)
        for i, _ in enumerate(dataset):
            network = update_theta(network, delta_output[i], delta_hidden[i], y_hidden[i], x[i], alpha)
        previous_error, current_error = current_error, np.mean(y_output - y)
        if debug: print "Error:", np.abs(previous_error - current_error)
    return network

Now let's test our network with the given data. We can assume that $2^n / n = 2$ is a good start for the number of hidden nodes in the network.



In [21]:

    
dataset = np.loadtxt("neural_network_regression.csv", delimiter=',')
network = train(dataset, 2, 2, 1)
print_network(network)









    



Error: 0.966067070868
Error: 0.00200573873633
Error: 0.0528909055358
Error: 0.0385394219638
Error: 0.0301376693171
Error: 0.0228056927497
Error: 0.0178435600793
Error: 0.0136862403044
Error: 0.0106910490516
Error: 0.00825000568614
Error: 0.00643365691233
Error: 0.0049797781709
Error: 0.00387841703332
Error: 0.00300686122572
Error: 0.00233978618289
Error: 0.00181564522038
Error: 0.00141203938012
Error: 0.00109630092277
Error: 0.000852295849385
Error: 0.000661922559625
Error: 0.00051448392327
Error: 0.000399639135234
Error: 0.000310580274988
Error: 0.000241277871367
Error: 0.000187494085049
Error: 0.0001456664838
Error: 0.000113189985312
Error: 8.79421690862e-05
Error: 6.83332995142e-05
Error: 5.30923394586e-05
Error: 4.125334113e-05
Error: 3.20527140645e-05
Error: 2.49050414226e-05
Error: 1.93507006941e-05
Error: 1.50354437762e-05
Error: 1.16822888027e-05
Error: 9.07707116017e-06
Error: 7.05275490555e-06
Error: 5.47993660582e-06
Error: 4.25784070377e-06
Error: 3.30830478232e-06
Error: 2.57051347396e-06
Error: 1.99726458795e-06
Error: 1.55185190565e-06
Error: 1.20577356388e-06
Error: 9.36872758259e-07
Error: 7.27940620336e-07
Error: 5.6560197458e-07
Error: 4.39466905634e-07
Error: 3.41461075852e-07
Error: 2.65311705586e-07
Error: 2.06144370792e-07
Error: 1.6017202121e-07
Error: 1.24451962414e-07
Error: 9.66978699548e-08

Output:
T[0][0] = 0.607317467651
T[0][1] = 0.104680349807
T[0][2] = 0.0406995289682

Hidden:
T[0][0] = 688.495268999
T[0][1] = 1.06063979087
T[0][2] = 0.0227695446208

T[1][0] = 687.761526824
T[1][1] = 0.742875270041
T[1][2] = 0.807385836202

The output function will take a neural network as defined above and an example and return the calculated output of the network. This will be used for validation.



In [22]:

    
def output(network, instance):
    return calculate_output(network, instance)[1]



In [23]:

    
x, y, y_h = [], [], []
for i in xrange(5):
    x.append(np.insert(dataset[i][:-1], 0, 1))
    y.append(dataset[i][-1:])
for i in xrange(5):
    y_h.extend(output(network, x[i]))
print "Error:", np.abs(np.subtract(y_h, y))









    



Error: [[ 0.40807665]
 [ 0.38718392]
 [ 0.05127389]
 [ 0.05625551]
 [ 0.40960062]]

Now we need to evalute the model. We will do so using a 67/33 split for training and validation. We will do this with 2 hidden nodes.



In [24]:

    
def partition(dataset, fold, k):
    segments = np.array_split(dataset, k)
    test = segments[fold-1]
    training = [segments[i] for i in xrange(k) if i != (fold-1)]
    return np.concatenate(training), test



In [25]:

    
dataset = np.loadtxt('neural_network_regression.csv', delimiter=',')
error = []
for i in xrange(10):
    np.random.shuffle(dataset)
    training, validation = partition(dataset, 3, 3)
    network = train(training, 2, 2, 1, debug=False)
    x, y = split_data(validation, 2)
    y_h = np.array([output(network, xval) for xval in x])
    mse = np.average((y_h - y)**2)
    error.append(mse)
print "Mean Square Error:", np.average(error)









    



Mean Square Error: 0.0624264711319

Now we'll calculate the mean square error using 10-fold cross validtion.



In [26]:

    
def cross_validation(dataset, n, folds=10):
    error = []
    for i in xrange(1, folds+1):
        training_set, test_set = partition(dataset, i, folds)
        network = train(dataset, 2, n, 1, debug=False)
        x, y = split_data(validation, 2)
        y_h = np.array([output(network, xval) for xval in x])
        error.append(np.average((y_h - y)**2))
    return np.average(error)



In [27]:

    
dataset = np.loadtxt('neural_network_regression.csv', delimiter=',')
error = cross_validation(dataset, 2)
print "Average MSE:", error









    



Average MSE: 0.0639949169618

Looks like the average error is roughly the same.

Summary

Artificial Neural Networks performs a similar function as Regression in the Machine Learning realm. It attemps to fit a line to the data by trying to recognize trends in data. Instead of adjusting the variables bit by bit and adjusting as necessary, Artificial Neural Networks work by back-propagating the error in the output layer to the hidden layers. The variables are then adjusting accordingly until convergence.

Artificial Neural Networks tend to be prone to overfitting though. It seems like it fits too perfectly and consequently the validation set has more error.

I managed to vectorize most of the calculations, including output and error calculations. Updating the theta required iteration because the function was non-linear, i.e. depended on the previous value of theta. I couldn't figure out how to get Numpy to do that although I'm not sure it's possible.

This particular assignment was a frustrating attempt to try and generalize matrix calculations. I did my best to generalize the formulas for calculating output and error but it seems like there was always a test case that would break the generalization. What worked on paper didn't necessary work well in code either.

I ended up sticking to a value of 2 for the number of hidden nodes which seems to work well enough. Suprisingly it converges very quickly so I question whether I'm actually doing it right.



In [ ]: